ai-chat: architecture cleanup, bug fixes, storage guards, new features, docs rewrite #899
threepointone merged 14 commits into main
Conversation
🦋 Changeset detected. Latest commit: 54bcc0f. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
Large refactor and feature update for the ai-chat package: extracted ResumableStream, simplified SSE parsing (message-builder), added WebSocket chat transport, and organized deprecated APIs. Adds many new tests (unit, react, and Playwright E2E), new examples (examples/ai-chat), and new docs (chat-agents.md plus updates to human-in-the-loop, client-tools-continuation, resumable-streaming, and more). Introduces features like incremental persistence, row-size guard, maxPersistedMessages, top-level body option for useAgentChat, and improved tool streaming/continuation flows. Misc: package.json updates and .gitignore entry for Playwright test-results.
threepointone force-pushed from 96eaea5 to a278f92.
/bonk review this PR
Several fixes to make streaming more robust and update E2E test metadata:

- e2e/llm.spec.ts: Update the test description to reflect Workers AI usage (AI binding in wrangler.jsonc) and note that no OpenAI API key is required; BadKeyAgent remains for invalid-key error handling.
- src/index.ts: Add a TODO and comment explaining a race where the completion promise can be null and a 500ms fallback is used; suggests a more deterministic signal.
- src/resumable-stream.ts: Use a local streamId snapshot and guard against it being undefined; replace the buggy self-comparison with a defensive check to detect if the active stream changed between lookup and replay.
- src/ws-chat-transport.ts: Introduce a `completed` flag to avoid handling aborts after the stream finished; set `completed = true` on error and done to prevent spurious abort handling.

These changes address race conditions around stream lifecycle, make intent clearer, and prevent duplicate/late abort handling.
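For illustration, the `completed` guard reduces to something like this (a minimal sketch -- the class and member names besides `completed` are invented, only the flag behavior mirrors the change):

```ts
// Minimal sketch of the `completed` guard in ws-chat-transport.ts.
// Names other than `completed` are illustrative.
class StreamGuard {
  private completed = false;

  onChunk(chunk: { type: string }) {
    // Mark completion on both the done and error paths so a late
    // abort (e.g. CF_AGENT_CHAT_REQUEST_CANCEL) is ignored.
    if (chunk.type === "done" || chunk.type === "error") {
      this.completed = true;
    }
  }

  onAbort(teardown: () => void) {
    if (this.completed) return; // stream already finished; nothing to abort
    teardown();
  }
}
```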
/bonk I addressed your review feedback, have another look
Delete pr.md containing the large PR draft and changelog notes. Removes an obsolete/draft documentation file from the repository.
- Add accurate UTF-8 byte-length measurement and apply it to row/chunk size guards to prevent SQLite row-limit issues.
- Preserve the latest custom request body for tool continuations and forward it to continuations.
- Harden streaming cleanup: always clear streaming references, resolve completion promises on errors, remove abort controllers, and emit observability.
- Improve resumable stream handling: skip oversized chunks using byte size, clear the in-memory chunk buffer on clearAll, and include errored streams in periodic cleanup.
- Add a fallback to create reasoning parts when a reasoning-delta arrives without a start.
- Update tests to use ws.close(1000); add tests for buffer clearing, errored-stream cleanup, and multi-byte Unicode byte-length behavior; expose small TestChatAgent helpers for testing stream cleanup and insertion of old errored streams.
/bonk I addressed your review feedback again, please review again
In packages/ai-chat/src/index.ts: add a cached TextEncoder (_encoder) and use it in _byteLength to avoid allocating a new encoder on every call (performance). Use this._lastBody as the request body instead of conditionally sending customBody. Adjust streaming cleanup to always remove the abort controller but only emit observability events when the stream completed successfully (prevents emitting on error paths). These changes reduce allocations and tighten observability event emission semantics.
/bonk I pushed some changes, do another review please
Add a module-level TextEncoder in packages/ai-chat/src/index.ts and packages/ai-chat/src/resumable-stream.ts and switch byte-length measurements to use it. Removes the AIChatAgent._encoder and avoids creating new TextEncoder instances on each measurement, reducing allocations when enforcing SQLite row and chunk size limits.
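In sketch form, the pattern is simply (helper name assumed):

```ts
// One module-level encoder, reused by every measurement.
const encoder = new TextEncoder();

// UTF-8 byte length of the serialized value -- what SQLite's 2MB row
// limit actually counts. "😀".length is 2 UTF-16 code units, but the
// character encodes to 4 bytes.
function byteLength(value: unknown): number {
  return encoder.encode(JSON.stringify(value)).byteLength;
}
```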
Pass explicit normal close code (1000) to ws.close() calls in packages/ai-chat/src/tests/client-tools-reconnect.test.ts. This makes the WebSocket close behavior explicit in the tests and avoids ambiguous/overloaded calls during test teardown.
Ok I've addressed all feedback, this is in pretty good shape. Once I fix the changeset problem, I'm going to merge this in.

This is throwing an infinite-loop useEffect in the playground; will debug and fix later.
Replace occurrences of the Workers AI model '@cf/openai/gpt-oss-120b' with '@cf/zai-org/glm-4.7-flash' across docs, examples, guides, and e2e tests. Also bump .changeset schema to @changesets/config v3.1.2 and relax @cloudflare/ai-chat peerDependency in packages/agents to allow ^0.1.0. Adjust README and test comments to reflect the new model.
Memoize createStubProxy in useAgent so the agent.stub is referentially stable across renders (useMemo). Update default model for the AI playground (client and server) from @cf/qwen/qwen3-30b-a3b-fp8 to @cf/zai-org/glm-4.7-flash. Also update the changeset to list multiple bug fixes for AIChatAgent (stream resumption, error handling, chunk buffer clearing, UTF-8 row size guard, GC of errored streams, and related stream completion fixes).
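A sketch of the memoization (stand-in declarations included so the snippet is self-contained; the real code lives in packages/agents/src/react.tsx):

```ts
import { useMemo } from "react";

type Call = (method: string, args: unknown[]) => Promise<unknown>;

// Stand-ins for the real helpers in packages/agents/src/react.tsx.
declare const call: Call;
declare function createStubProxy(c: Call): Record<string, unknown>;

// Inside useAgent: rebuild the Proxy only when `call` changes, so
// agent.stub keeps the same identity across renders and is safe to
// use in dependency arrays without retriggering effects.
const stub = useMemo(() => createStubProxy(call), [call]);
```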
/bonk do another review |
When a text-delta is received without a preceding text-start, create the fallback text part with state="streaming" and update tests to verify the streaming state and that a subsequent text-end marks it as "done". Also wrap the waitForStream() continuation flow in a .catch to log tool continuation failures and tidy the continuation reply flow (ensure continuation flag/chatMessageId are passed into _reply). These changes improve stream resumption handling and prevent unhandled promise rejections during tool continuations.
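In sketch form (the part shape is modeled on AI SDK text parts; the real logic in message-builder.ts is more involved):

```ts
type TextPart = { type: "text"; text: string; state: "streaming" | "done" };

// If a text-delta arrives with no open text part (e.g. the first chunk
// seen after a resume), create the fallback part in the "streaming" state.
function onTextDelta(parts: TextPart[], delta: string): void {
  let part = parts.find((p) => p.state === "streaming");
  if (!part) {
    part = { type: "text", text: "", state: "streaming" };
    parts.push(part);
  }
  part.text += delta;
}

// A subsequent text-end marks the part as "done".
function onTextEnd(parts: TextPart[]): void {
  const part = parts.find((p) => p.state === "streaming");
  if (part) part.state = "done";
}
```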
/bonk one more review please. |
Return the value (promise) from this._tryCatchChat(...) so the caller receives and can await the async operation. Previously the call wasn't returned, which could drop the promise/return value and break propagation of the agent continuation/abort flow.
/bonk one more review, tell me if this is ready to land |
Co-authored-by: threepointone <threepointone@users.noreply.github.com>
Here's my review:

Final Review (commit 93e1bbb)

All prior review items: resolved. Every item flagged across the 4 previous review rounds is addressed:
Build and tests
Remaining low-severity items (not blocking)
Verdict

This PR is ready to land. All high and medium severity issues from 4 review rounds are fixed. The code is sound, the architecture extraction is clean, the bug fixes are real and well-tested, and the 136 Workers tests pass. The remaining items are all low-severity nits suitable for follow-up work.
Update package-lock.json to add "peer": true to numerous dependency entries, remove/adjust a few misplaced peer flags, and fix minor license formatting. This standardizes peer metadata across the lockfile to ensure consistent dependency metadata.
Description
TL;DR
- Refactored `AIChatAgent` -- extracted stream resumption, WebSocket transport, and message building into focused modules
- Fixed `setMessages` data loss, the stream resumption race condition (Stream resumption loses reasoning/thinking state after page refresh, #896), SQLite crashes on large tool outputs, and tool continuation hangs on error
- Added `maxPersistedMessages`, a `body` option, incremental persistence, and automatic row size compaction
- Made `onFinish` optional -- the framework handles cleanup automatically
- Memoized `agent.stub` in `useAgent` so it is referentially stable across renders

All changes are backward compatible. No public API was removed.
Motivation
Users were hitting crashes from SQLite's 2MB row limit on large tool outputs, stream resumption had a race condition where reconnecting clients missed the resume notification (#896), and `setMessages` with a functional updater silently lost data. These bugs were difficult to isolate because the codebase had no separation of concerns -- `index.ts` was 2,286 lines with stream resumption, chunk buffering, SSE parsing, tool handling, message persistence, and the WebSocket protocol all interleaved in one class.

On the client side, `useAgentChat` created a fake `fetch` callback that assembled a `Response` object from WebSocket messages, then fed it into the AI SDK's `DefaultChatTransport`, which re-parsed the SSE. Every chunk was serialized, deserialized, re-serialized, and re-deserialized.

Deprecated APIs from the v4/v5 era were mixed in with current code, with no deprecation warnings or clear boundaries. Test coverage was minimal -- no e2e tests, limited unit tests, no React hook tests. Documentation showed outdated patterns that no longer worked correctly with AI SDK v6.
What changed
Architecture (no behavior change)
Extracted three focused modules from the monolithic `AIChatAgent` class:

- `resumable-stream.ts` -- stream chunk management
- `ws-chat-transport.ts` -- `ChatTransport<UIMessage>` for the AI SDK
- `message-builder.ts` -- `UIMessage` parts from stream chunks (shared server/client)

`index.ts` went from 2,286 to ~1,760 lines. `react.tsx` went from 1,461 to ~1,350 lines. The total line count is similar but the code is now modular and testable.

Bug fixes (12)
1. `setMessages` functional updater data loss -- `setMessages(prev => [...prev, msg])` sent an empty array to the server because the wrapper did not resolve the function before syncing (sketch after this list).
2. `_sendPlaintextReply` creating multiple text parts -- each network chunk became a separate `text` part in the message instead of accumulating into one.
3. `JSON.parse(undefined)` threw an uncaught `SyntaxError` when clients sent malformed requests.
4. `CF_AGENT_MESSAGE_UPDATED` not broadcast for streaming messages -- tool results applied during an active stream were silently swallowed instead of being broadcast to other connections.
5. Stream resumption race (#896) -- the server sent `CF_AGENT_STREAM_RESUMING` in `onConnect` before the client's message handler was registered. Fixed with a client-initiated `CF_AGENT_STREAM_RESUME_REQUEST` protocol and a `replay` flag on buffered chunks.
6. `_streamCompletionPromise` not resolved on error -- if a stream errored, the completion promise was never resolved, causing tool continuations waiting on it to hang forever. Moved cleanup into a `finally` block.
7. `body` lost during tool continuations -- custom `options.body` data was not passed through to `onChatMessage` during auto-continuation. Now stored alongside `_lastClientTools`.
8. `clearAll()` not clearing the chunk buffer -- if `clearAll()` was called while chunks were buffered in memory, the next `flushBuffer()` would write orphaned chunks to freshly-cleared tables.
9. Periodic cleanup only collected `status = 'completed'` streams -- errored streams persisted indefinitely.
10. `reasoning-delta` dropping data without `reasoning-start` -- unlike `text-delta`, which had a fallback, `reasoning-delta` silently discarded data if no matching reasoning part existed. Affects stream resumption where `reasoning-start` was missed.
11. Row size guard measured `string.length` instead of byte count -- `JSON.stringify().length` measures UTF-16 code units, not UTF-8 bytes. Multi-byte Unicode (CJK, emoji) could exceed SQLite's 2MB byte limit while passing the character-length check. Now uses `TextEncoder` for accurate measurement.
12. Late aborts after stream completion via `CF_AGENT_CHAT_REQUEST_CANCEL` -- added a `completed` flag guard.
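For illustration, fix 1 comes down to resolving the updater against the current messages before syncing. A minimal sketch (the real wrapper in `react.tsx` also syncs and broadcasts the result):

```ts
type MessagesUpdater<M> = M[] | ((prev: M[]) => M[]);

// Resolve the function form first, so setMessages(prev => [...prev, msg])
// sends the actual resulting array to the server instead of losing it.
function resolveMessages<M>(next: MessagesUpdater<M>, current: M[]): M[] {
  return typeof next === "function" ? next(current) : next;
}
```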
New features

- `maxPersistedMessages` -- cap the SQLite message count. Oldest messages are deleted after each persist. Default: unlimited (backward compatible).
- `body` option on `useAgentChat` -- send custom data with every request. Accepts static objects or functions (sync/async). Available in `onChatMessage` via `options.body`. (Usage sketch for both options after this list.)
- Automatic row size compaction -- oversized messages are compacted in place, with metadata (`compactedToolOutputs`/`compactedTextParts`) so clients can detect compaction.
- Chunk size guard -- `ResumableStream` skips storing chunks over 1.8MB (still broadcast to live clients, just not persisted for replay).
- `onFinish` made optional -- abort controller cleanup and observability emit moved into the framework's stream completion handler.

`agents` package fixes:

- `agent.stub` now referentially stable -- `createStubProxy(call)` was creating a new `Proxy` on every render, causing infinite loops when `agent.stub` was used in dependency arrays. Now memoized with `useMemo`.
- `cacheInvalidatedAt` lint fix -- referenced inside the `useMemo` body to make the cache-buster dependency explicit.
- Relaxed `@cloudflare/ai-chat` peer dep -- `"^0.0.8"` to `"^0.0.8 || ^0.1.0"` to prevent changeset cascade to major bumps.
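A usage sketch of `maxPersistedMessages` and `body` (the hook declaration below is a simplified stand-in so the snippet compiles on its own; see `docs/chat-agents.md` for the real API and option placement):

```ts
// Simplified stand-in for the real hook so this sketch is self-contained.
declare function useAgentChat(opts: {
  agent: unknown;
  body?: object | (() => object | Promise<object>);
}): { messages: unknown[] };

declare const agent: unknown; // from useAgent(...)

// `body` accepts a static object or a sync/async function; server-side
// it arrives in onChatMessage as options.body and, per this PR, is
// preserved across tool continuations.
const chat = useAgentChat({
  agent,
  body: async () => ({ locale: navigator.language })
});

// Server side, setting maxPersistedMessages caps stored history: the
// oldest rows are deleted after each persist (default: unlimited).
```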
Deprecations

All deprecated APIs now emit a one-time `console.warn` on first use, have `@deprecated` JSDoc, and are marked with `// -- DEPRECATED --` section banners. Removal planned for next major.
createToolsFromClientSchemas()Client:
extractClientToolSchemas(),detectToolsRequiringConfirmation(),tools,toolsRequiringConfirmation,experimental_automaticToolResolution,autoSendAfterAllConfirmationsResolved,addToolResult()Migration:
migrateToUIMessage(),migrateMessagesToUIFormat(),needsMigration(),analyzeCorruption()Docs
- `docs/chat-agents.md` -- comprehensive reference for `AIChatAgent` and `useAgentChat` (673 lines). Covers the server API, client API, all three tool patterns, custom request data, resumable streaming, storage management, multiple AI providers, multi-client sync, and the WebSocket protocol.
- `README.md` -- correct patterns, Workers AI examples, no `onFinish` boilerplate.
- `human-in-the-loop.md` -- modern `needsApproval` + `onToolCall` patterns.
- `client-tools-continuation.md` -- `autoContinueAfterToolResult` with Workers AI.
- `resumable-streaming.md` -- accuracy fixes, new protocol details, back-links.
- `index.md` -- moved resumable streaming to the AI Integration section, removed the TODO marker for chat-agents.

Examples
- `examples/ai-chat/` -- showcases all recommended patterns: server tools, client tools (`onToolCall`), tool approval (`needsApproval`), `pruneMessages`, `maxPersistedMessages`, the `body` option, and Workers AI `@cf/zai-org/glm-4.7-flash` (no API key needed).
- `examples/resumable-stream-chat/` -- updated to `toUIMessageStreamResponse()`, fixed the CSS `@source` path.
- `guides/human-in-the-loop/` -- rewritten to modern patterns.
- `examples/playground/` -- memoized callbacks to fix exhaustive-deps warnings.
- `site/ai-playground/` -- changed to `@cf/zai-org/glm-4.7-flash`.

Tests
Notable additions:
- `message-builder.test.ts` (34 tests) -- full chunk type coverage including the tool streaming lifecycle and stream resumption fallbacks
- `row-size-guard.test.ts` (11 tests) -- incremental persistence, compaction, Unicode byte-length, chunk guard
- `max-persisted-messages.test.ts` (5 tests) -- storage cap enforcement
- `onfinish-cleanup.test.ts` (5 tests) -- abort controller cleanup without a user `onFinish`
- `resumable-streaming.test.ts` -- clearAll buffer clearing, errored stream cleanup
- React hook tests -- the `body` option, re-render stability, `clearHistory`, `onToolCall`

Design decisions
Why extract modules instead of rewriting from scratch?
The existing behavior is battle-tested in production. A full rewrite would risk subtle regressions in the WebSocket protocol, hibernation recovery, and stream resumption. Instead, we extracted code into modules with clear interfaces while preserving the exact same behavior. Every extraction was verified by the new test suite.
Why a native WebSocket ChatTransport?
The old approach created a fake `Response` object from WebSocket messages so it could use the AI SDK's HTTP-based `DefaultChatTransport`. This meant every chunk was serialized to SSE, wrapped in a Response, then re-parsed from SSE. The new `WebSocketChatTransport` implements `ChatTransport<UIMessage>` directly, returning a `ReadableStream<UIMessageChunk>` from WebSocket events. No fake fetch, no double serialization.
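A condensed sketch of the shape (the wire format and class below are invented for illustration; the real `ws-chat-transport.ts` implements the full `ChatTransport<UIMessage>` contract plus aborts and reconnection):

```ts
import type { UIMessage, UIMessageChunk } from "ai";

// WebSocket events become UIMessageChunks directly -- no fake
// fetch/Response and no SSE round-trip.
class WsTransportSketch {
  constructor(private ws: WebSocket) {}

  async sendMessages(opts: { messages: UIMessage[] }): Promise<ReadableStream<UIMessageChunk>> {
    this.ws.send(JSON.stringify({ type: "chat-request", messages: opts.messages }));
    return new ReadableStream<UIMessageChunk>({
      start: (controller) => {
        const onMessage = (event: MessageEvent) => {
          const chunk = JSON.parse(event.data as string) as UIMessageChunk;
          controller.enqueue(chunk);
          if (chunk.type === "finish") {
            // Terminal chunk: stop listening and close the stream.
            this.ws.removeEventListener("message", onMessage);
            controller.close();
          }
        };
        this.ws.addEventListener("message", onMessage);
      }
    });
  }
}
```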
Why compaction instead of splitting messages across rows?

SQLite rows have a 2MB hard limit. We considered splitting large messages across multiple rows, but this would have complicated every query, broken hibernation wake-up (which loads all messages), and created consistency risks. Instead, we compact in-place: large tool outputs are replaced with an LLM-friendly summary, and the metadata preserves what was compacted. The LLM still gets useful context ("this tool returned a large result that was compacted -- suggest re-running it"), and the UX degrades gracefully rather than crashing.
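A sketch of the compaction idea (shapes and the 100KB per-part threshold are illustrative; `compactedToolOutputs` is the metadata key named above):

```ts
const encoder = new TextEncoder();
const ROW_LIMIT = 2 * 1024 * 1024; // SQLite's 2MB row limit

type ToolPart = { type: string; toolCallId: string; output?: unknown };
type StoredMessage = { parts: ToolPart[]; metadata?: Record<string, unknown> };

// Compact in place: swap oversized tool outputs for an LLM-friendly
// summary and record what was compacted, instead of splitting the
// message across rows.
function compactIfNeeded(message: StoredMessage): StoredMessage {
  if (encoder.encode(JSON.stringify(message)).byteLength <= ROW_LIMIT) {
    return message;
  }
  const compactedToolOutputs: string[] = [];
  for (const part of message.parts) {
    if (
      part.output !== undefined &&
      encoder.encode(JSON.stringify(part.output)).byteLength > 100_000
    ) {
      compactedToolOutputs.push(part.toolCallId);
      part.output =
        "[Large tool result compacted to fit storage -- re-run the tool if the full output is needed.]";
    }
  }
  message.metadata = { ...message.metadata, compactedToolOutputs };
  return message;
}
```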
Why not remove deprecated APIs now?
Users are actively using the v4/v5 patterns (`tools`, `toolsRequiringConfirmation`, `experimental_automaticToolResolution`). Removing them would be a breaking change. Instead, we added `console.warn` on first use, `@deprecated` JSDoc, and clear section banners. The migration path is documented in `migration-to-ai-sdk-v6.md`. Removal happens in the next major.

Why keep `onFinish` in the signature at all?

Even though framework cleanup is now automatic, users may still want `onFinish` for their own logic (logging, analytics, side effects). Making it optional rather than removing it preserves that escape hatch without forcing everyone to use it.

Why one large PR instead of a series of smaller ones?
The architecture extraction, bug fixes, and new features are interconnected. For example, the row size guard depends on incremental persistence, which depends on the extracted `ResumableStream` class. The stream resumption race fix touches both `index.ts` and `react.tsx`. Splitting these into separate PRs would have created intermediate states where the code compiled but had subtle inconsistencies. The "Notes for reviewers" section below suggests a review order that makes the diff manageable.

Notes for reviewers
Suggested review order:
Start with the three new extracted modules -- they are self-contained and easy to review in isolation:
- `packages/ai-chat/src/resumable-stream.ts` -- stream chunk management
- `packages/ai-chat/src/ws-chat-transport.ts` -- WebSocket transport
- `packages/ai-chat/src/message-builder.ts` -- shared chunk-to-parts logic

Then review `index.ts` -- the diff is large but mostly deletions from the extraction above. The remaining new code is: incremental persistence cache, row size guard (with UTF-8 byte measurement), `onFinish` cleanup in `finally`, `body` preservation for continuations, and tool continuation error handling.

Then review `react.tsx` -- changes are in distinct sections:

- `WebSocketChatTransport` usage (replaces `aiFetch`)
- `body` option merging in `prepareBody`
- `toolsRequiringConfirmation` memoized with `useMemo`
- `onToolCall` type fix (omit from `UseChatParams` to avoid union)

Then `packages/agents/src/react.tsx` -- small change: `agent.stub` memoized with `useMemo`, `cacheInvalidatedAt` referenced in memo body.

Then docs, examples, tests -- these are straightforward to skim.
Other notes:
The e2e tests use Workers AI (`@cf/zai-org/glm-4.7-flash`) -- no API key needed. The only OpenAI usage is `BadKeyAgent`, which intentionally tests error handling with an invalid key.

All changes are backward compatible. No public API was removed. New features (`maxPersistedMessages`, `body`, incremental persistence, row size guard) are opt-in or automatic with no behavior change for existing users.

The playground lint fixes (`ChatRoomsDemo`, `SupervisorDemo`, `SqlDemo`, `ScheduleDemo`, `useLogs`) are unrelated to ai-chat but fix pre-existing exhaustive-deps warnings that showed up in `npm run check`.

Deferred
These were considered during the refactor but intentionally left for follow-up work:
- Decompose `useAgentChat` into smaller hooks. The hook is ~900 lines with tool resolution, stream resumption, message sync, and transport setup all in one function. It should be split into composable hooks (`useStreamResumption`, `useToolResolution`, etc.), but doing so in this PR would have made the diff even larger and harder to review.
- Remove deprecated APIs. The v4/v5 client tool patterns (`tools`, `toolsRequiringConfirmation`, `experimental_automaticToolResolution`, `addToolResult`, etc.) are still used by existing apps. This PR adds deprecation warnings and JSDoc; actual removal happens in the next major version.
- Revisit constructor monkey-patching. `AIChatAgent` wraps lifecycle methods (`onConnect`, `onClose`, `onMessage`) in the constructor so users do not have to call `super`. This is a deliberate DX choice but makes the class harder to reason about. A middleware/hook pattern would be cleaner, but changing it is a breaking API change.
- WebSocket-based initial messages. Currently, `useAgentChat` fetches initial messages via an HTTP `GET /get-messages` endpoint, which requires a Suspense boundary. Sending them over the existing WebSocket connection would eliminate the HTTP call and simplify the client setup, but the interaction with React Suspense (the socket unmounts during suspend) needs careful design.
- N+1 deletes in `_enforceMaxPersistedMessages`. Each excess message is deleted individually. A single `DELETE WHERE id IN (...)` would be cleaner, but the performance difference is negligible with local SQLite (no network latency); see the sketch below.
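For reference, the batched alternative would look roughly like this against Durable Object SQLite (the table name and helper are assumptions for illustration):

```ts
// Sketch: one batched statement instead of N single-row deletes.
// `SqlStorage` is the Durable Object SQLite API; the table name is
// an assumption for illustration.
function deleteExcess(sql: SqlStorage, excessIds: string[]): void {
  if (excessIds.length === 0) return;
  const placeholders = excessIds.map(() => "?").join(", ");
  sql.exec(
    `DELETE FROM cf_ai_chat_agent_messages WHERE id IN (${placeholders})`,
    ...excessIds
  );
}
```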